Application of Bloom filters in Fast Packet Classification

 

Milind K. Chavan1*, B. S. Agarkar2

1P.G. Student, S.R.E.S College of Engineering, Kopargaon, University of Pune, Pune

2Professor, Dept. E&Tc Engg. S.R.E.S’ College of Engineering, Kopargaon, University of Pune, Pune

*Corresponding Author: milindkc@gmail.com, bagarkar@yahoo.co.in

  

ABSTRACT:

Packet classification finds various applications in computer networks like QoS, Firewalls, multimedia communication, telecommunication, security, monitoring data traffic etc. To classify packets to a particular flow or the set of flows, Intermediate nodes which are present in the network must perform search for a rule which defines the flow of that particular packet which is chosen on the basis of different field present in the data packet. The rule set is predefined by the user, which is constructed on the basis of algorithmic and architectural methodology. The major constraint in this methodology is searching speed of particular rule. Since few decades researches are finding the best computational methodology for packet classification. But the current algorithms which are used in the packet classification highly rely on expensive and high power consuming devices like TCAM (Ternary content addressable memory). Therefore searching of fast and power efficient algorithms for packet classification is still the subject of interest for researchers.

 

In this paper we have delivered a new direction to packet classification which includes algorithmic and architectural structure for packet classification. Our inception is from well-known Cross product algorithm which is very fast but introduce additional rules which increases memory requirement. We have shown how to enhance the crossproduct in a way which drastically reduces this addition of extra rules, without affecting the throughput of the algorithm, unnecessary memory access to the off chip memory are avoided by filtering them through on chip bloom filter.

 

For packets that matches p rules in   a rule set, our algorithm requires only P+4+ε memory access to find all the matching rules. Where ε <<1 which is a constant that depends on small false positive rate of Bloom filter. Using two SRAM chips search speed of 38 million packets per second can be achieved. For the rule set size of few hundred to thousand rules an average rules set expansion factor is 1.2 to 1.4 and average memory consumption is 32 to 45 bytes.

 

KEY WORDS: Packet classification, TCAM, crossproduct, bloom filter, SRAM.

 


INTRODUCTION:

Packet classification is becoming favorite topic of researchers in last few decades as the demand of large data in communication is increasing day by day, thus we require more and more sophisticated algorithmic techniques to fulfill this demand. Logically packet classification technique is nothing but comparing the bit stream given in the different fields in data packets with the classifier which consist of a rule set. The comparison is done with the prefix bits and not with the whole bit stream. After the matched rule is found the respective action is applied on the packet which is defined in the classifier.

 

However till this date none of the computational techniques is not able to eliminate TCAM in real life application. TCAMs are the storing devices that stores the array of limited width key. This key is used to search the rule in parallel and produces the result when any entry matches with the key. Recent TCAMs supports up to 133 million searches per second for 144 bit wide key and can be able to store 128K keys that are 144 bit wide [6] TCAM devices are costly and consumes 50 times much power than other devices but they are still favorite choice of manufacturer. They are also 15 times more bulky than SRAM.

 

In this paper we are implementing a new logarithmic method i.e. Crossproduct algorithm. In this algorithm a single data structure is created for a single and number of data structure are constructed as per the no of field to be checked according to the classifier. But the drawback is it create more amount of additional rules. Which require significantly large amount of memory. So to avoid this unnecessary usage of memory by additional rules, multiple subsets of data structure (trie) are created, which drastically reduces the unnecessary usage of memory. In cross producting multiple subset algorithm the results are used to form a key, which is used to lookup in lookup table to find matched rule.

 

To achieve this we will first look up the prefix for each field separately using bloom filter which is fast and memory efficient searching technique. Therefore, with a very high probability the longest prefix matching can be performed on the source and destination addresses and the source and destination port in just four memory accesses.

 

To reduce the memory consumption we have divided the rules into multiple subsets and then constructed cross product lookup table. As the rules are distributed into no. of sets, we need to perform lookup in each subset for which we can use Bloom filter. This computational technique will avoid the unnecessary lookups in those subset which do not match the prefix address.

 

To reduce memory requirement, we divide the rule in multiple subset and then construct a cross product table for each subset. This will reduce the requirement of additional rules in cross product. As we divided the rules in no of subsets and then constructed a cross product table. Thus, we require lookup in each subset. However we can use bloom filter to avoid this extra lookup in the subset which are not having matching rule, which helps to get high through put from this algorithm. If multiple rules are matched then we will require only 4 access to choose the highest priority rule where P is the number of rules that packet can match.

 

LITERATURE SURVEY:

Packet classification is the very important element in computer network. As the demand of digital data is increasing day by day, the speed of transmission and reception of digital data is also increasing. In order to process this huge amount of digital data a sophisticated technique should be used. An excellent survey and classification is given on exiting packet classification algorithm in [13].Here we will discuss only the algorithm which are close to our algorithm.

 

The basic idea for packet classification algorithm is perfectly explain in [6] which enlight the knowledge of packet classification with a very good and simple example of real life classifier. In this paper Pankaj Guptan  and Mc Keown has given an example network connected two enterprise network and two internet service provider across a network access point. By using this examples they have explained application of packet classification. They have also classified the basic techniques in 4 types

1) Basic data structure.

2) Geometric algorithms.

3) Heristics based.

4) Hardware based.

Which provide exact look up performance is related to basic cross product algorithm [11].

 

The basic concept of cross product algorithm is that, we have to perform LPM on each field first and then combine the results of individual to form a key which mapped towards crossproduct table. This best matching rule from the cross product table can be fetched in only one memory access (cycle). The single field look up as in the RFC algorithm [6]. The BV [7] and ABV [3] algorithm uses bit vector intersection to replace the cross product look up TCAM are largely used in packet classification.

 

TCAMs are widely used for packet classification. Latest TCAM devices also include the banking mechanism to reduce the power consumption by selectively turning off the unused banks. Traditionally, TCAM devices needed to expand the range values into prefixes for storing a rule with range specifications. The recently introduced algorithm, DIRPE [7], uses a clever technique to encode ranges differently which results in overall lesser rule expansion compared to the traditional method. The authors also recognized that in modern security applications, it is not sufficient to stop the matching process after the first match is found but all the matching rules for a packet must be reported. They devised a multi-match scheme with TCAMs which involves multiple TCAM accesses.

 

In paper [6] the authors have done longest prefix matching using bloom filter to get the required rule for the packet, in this paper  bloom filters are created on the bases of count of bit present in the destination IP address. The bloom filter is made like: for 1 bit prefix. 1bit bloom filter is programmed for 2bit prefix two bit length bloom filter is programmed and so on. After matching the 1,2,3,…..n numbers of prefixes one by one the proportional data is collected, if 1bit prefix is matched with 1bit bloom filter then filter will generate 1bit at its output, if it do matched then it will generate zero at the output thus 1,2,3………n no. of prefixes are matched with IP address. Even a single bit in output string of bloom filter is 0 then it will discard the packet. If matched then, computation is performed on the output string which is called hashing. After hashing the identity of particular rule is return.

 

Deficient cross product algorithm

Deficient cross product algorithm work as follows. First, the separate trie is constructed on the bases of fields which are represented in the rule set. In this trie each node is marked with the prefix involve in the rule. Let the first trie for the field, search and second trie for field 2 search. The connection between the marked nodes is nothing but the rule. At start we perform independent search for each field in the respective individual trie and find the most specific prefix .i.e. the longest matching prefix (LPM). After this we create a unique key and use it to index the cross product rule table. Every rule in cross product table is original rule on an artificial rule which we generated during crossproducting, as the cross producting is multiplicative in nature. This rules either forms matching rule or do not form any rule. Hence on matching we either get nothing i.e. no match or match. Thus when there is match present, we always gets the correct rule. This is shown in Figure 1.

 

Table 1 Basic Classifier Table

 

r1

1*

*

r2

1*

00*

r3

01*

100*

r4

101*

11*

r5

101*

11*

r6

00*

0

 

 

Figure 1 Illustration of basic cross product algorithm

 

Here we shown two dimensional rule set where each field is 4 bit wide for the purpose of demonstration.

 

TABLE 2 Representation of cross product table

 

1*

*

r1

 

1*

00*

r2

p1

1*

11*

r1

p1

1*

100*

r1

 

00*

*

r6

p3

00*

00*

r6

p4

00*

11*

r6

p5

00*

100*

r6

 

01*

*

 

 

01*

00*

 

 

01*

11*

 

 

01*

100*

r3

p6

101*

*

r1

p7

101*

00*

r1 r2

 

101*

11*

r5

 

101*

100*

r4

These crossproduct algorithm has two deficiencies.

1) A large no of empty rule.

2) A very large no of pseudo rule.

 

 

The first problem is eliminated by using hash table. Instead of using direct look up table,as the cross product table maintain all the possibility that are generated by cross producting, so that it can index directly to the rule set. Thus maintaining a hash table is the best way. There are empty rule also, which can be compressed by this method.

 

Above all this if we use Bloom filter described in above topic before Hash table it drastically improves the throughput. Because it require only one memory access per LPM.

 

Therefore entire classification process takes 5 memory access with very high probability to classify a packet.

 

Figure 3 Representaion of Pseudo Rule and Original Rule

 

Multi Subset Cross Producing Algorithm

In the deficient crossproduct algorithm to get list of matching rule we required only one hash table access. However if we allow our self to use multiple hash table accesses then we can split the single subset into multiple smaller rule set and taking cross produdcting between them. By this method the pseudo rule get reduce significantly as compared to deficient cross producting algorithm. This is shown in figure 4.

 

Figure 4 dividing rule in separate subset Fig. (a)(b)&(c) to reduce overlap and LPM Tables 3(b) & Tables 3(b) for individual field.

 

Table 3(a)

 

G1

G2

G3

1*

1

1

-

00*

-

-

2

01*

-

2

-

101*

3

1

-

 

Table3(b)

 

G1

G2

G3

*

-

0

0

00*

2

0

0

11*

2

0

0

100*

3

3

0

 

 

We divided the rule set into three subset and within those subset we has taken the crossproducting and inserted those rule which are provided in ruleset this results inserting P7 pseudo rule in subset 1 (G1) andP2 in subset 2(G2) all the pseudo rule vanishes and the load of the extra rule reduces significantlyl. Now one question arises in our mind how this. Pseudo rule vanishes? This is because the cross product is by default multiplicative in nature. When the no. of overlapping prefixes of a field i get reduced by a factor of xi due to partisioning the resulting reduction in the cross product rule is of the order πxi and here large. After the reduction of the required memory by the cross productivity, an independent hash table can be prepaired in which for each independent rule in subset, independent look can be performed. This splitting insert two extra memory access.

 

1) An entire LPM process is perform for all subset.

2) A separate access is required to look up final from hash table.

 

 

Explanation to the Flow of Algorithm

After splitting the rule set into multiple subset LPM is done on an each subset, this will generate the key which is place in the LPM table for particular field. So, LPM is done on each subset and a key is obtained. Key is nothing but a no. of prefixes matched in the subset if no. prefix is matched in subset, then it will simply take the key as zero. So in practice we can directly skip this subset and move to next subset. But for the purpose of analysis we will consider all the matching prefix key as non-zero. After probing this rule directly in hash table we get multiple unnecessary rules. This problem is avoided by using bloom filter. We maintain one bloom filter in chip memory corresponding to each off chip rule subset hash table. We first tests the bloom filters with the key to be looked up in the subset. If the filter shows the match subset, then It will simply take the key as zero. So in practice we can directly skip this subset and move to next subset. But for the purpose of analysis we will consider all the matching prefix key as non-zero. After probing this rule directly in hash table we get multiple unnecessary rules. This problem is avoided by using bloom filter. We maintain one bloom filter in chip memory corresponding to each off chip rule subset hash table. We first tests the bloom filters with the key to be looked up in the subset. If the filter shows the match. We took the longest matching prefix (up the key in the off chip hash table the flow of algorithm is shown in figure 5.

 

 

 

RESULTS:

Our algorithm will classifies a packet in only 4+p+€, where p is the no of rules a given packet can match, is the small constant that gives false positive match of bloom filter. Our algorithm is also much power efficient. It consumes an average 32 to 45 bytes per rule of memory. Hence rule set are large as 128k  which can easily supported in less than 5 MB of SRAM. Using two 300 MHz, 36 bit wide SRAM chips packet can be classified at the rate of 38 million packets per second.


 

Figure 5 Illustration of flow of algorithm

 

 


CONCLUSION:

Algorithmic solution are always a better alternative for TCAM for lower cost, less power consumption and flexibility. Our computational methodology includes multi set crossproducting, which are much better than only crossproducting with insertion of bloom filter which accelerates computational process of packet cassification. Our algorithm will classify packets in only 4+p+ ε, where p is the no of rules a given packet can match, ε is the small constant that gives false positive match of bloom filter. Our algorithm is also much power efficient. It consumes an average 32 to 45 bytes per rule of memory. Hence rule set are large as 128k   can easily supported in less than 5 MB of SRAM. Using two 300 MHz, 36 bit wide SRAM chips packet can be classified at the rate of 38 million packets per second.

 

REFERENCES:

[1]     IDT Generic Part: 71P72604. http://www.idt.com/?catID=58745 &genID=71P72604.

[2]      IDT Generic Part: 75K72100. http://www.idt.com/?catID=58523 &genID=75K72100.

[3]     Florin Baboescu and George Varghese. Scalable Packet Classification. In ACM SIGCOMM, 2001.

[4]     Sarang Dharmapurikar, P. Krishnamurthy, and Dave Taylor. Longest Prefix Matching using Bloom Filters. In ACM SIGCOMM, August 2003.

[5]     Will Eatherton. Fast IP Lookup Using Tree Bitmap. Washington University Master Thesis, 1999.

[6]     Pankaj Gupta and Nick McKeown. Packet classification on multiple fields. In ACM SIGCOMM, 1999.

[7]     T. V. Lakshman and D. Stiliadis. High-speed policy-based packet forwarding using efficient multi-dimensional range matching.In ACM SIGCOMM, 1998.

[8]     K. Lakshminarayanan, Anand Rangarajan, and Srinivasan Venkatachary. Algorithms for Advanced Packet Classification using Ternary CAM. In ACM SIGCOMM, 2005.

[9]     Haoyu Song, Sarang Dharmarpurikar, Jonathan Turner, and John Lockwood. Fast Hash Table Lookup Using Extended Bloom.

[10]   Filter: An Aid to Network Processing. In ACM SIGCOMM, 2005.

[11]   V. Srinivasan, Subhash Suri, and George Varghese. Packet Classification Using Tuple Space Search. In ACM SIGCOMM, 1999.V. Srinivasan, George Varghese, Subhash Suri, and Marcel Waldvogel. Fast and Scalable Layer Four Switching. In ACM SIGCOMM, 1998.

[12]   David Taylor and Jon Turner. Scalable Packet Classification Using Distributed Crossproducting of Field Labels. In IEEE INFOCOM, July 2005.

[13]   David E. Taylor. Survey and taxonomy of packet classification techniques. Washington University Technical Report, WUCSE-2004, 2004.

[14]   David E. Taylor and Jonathan S. Turner. Classbench: A Packet Classification Benchmark. In IEEE INFOCOM, 2005.

[15]   Fang Yu and Randy H. Katz. Efficient Multi-Match Packet Classification with TCAM. In IEEE Hot Interconnects, August 2003.

[16]   Balasaheb S. Agarkar, Uday V. Kulkarni “A Novel Technique for Fast Packet Classification”. International Journal of Computer Applications (0975 – 8887), Volume 76– No.4, August 2013.

[17]   Fang Yu, T. V. Lakshman, Martin Austin Motoyama, and Randy        H. Katz. Ssa: a power and memory efficient scheme to multi-match packet classification. In ANCS ’05: Proceedings of the 2005 symposium on Architecture for networking and communications systems, 2005

 

 

Received on 27.05.2014                             Accepted on 14.06.2014        

©A&V Publications all right reserved

Research J. Engineering and Tech. 5(2): April- June 2014 page 77-82